Low-Rank Doubly Stochastic Matrix Decomposition for Cluster Analysis
Authors
Abstract
Cluster analysis by nonnegative low-rank approximations has made remarkable progress in the past decade. However, the majority of such approximation approaches are still restricted to nonnegative matrix factorization (NMF) and suffer from two drawbacks: 1) they are unable to produce balanced partitions for large-scale manifold data, which are common in real-world clustering tasks; 2) most existing NMF-type clustering methods cannot automatically determine the number of clusters. We propose a new low-rank learning method that addresses these two problems and goes beyond matrix factorization. Our method approximately decomposes a sparse input similarity in a normalized way, and its objective can be used to learn both cluster assignments and the number of clusters. For efficient optimization, we use a relaxed formulation based on a Data-Cluster-Data random walk, which is also shown to be equivalent to low-rank factorization of the doubly stochastically normalized cluster incidence matrix. The probabilistic cluster assignments can thus be learned with a multiplicative majorization-minimization algorithm. Experimental results show that the new method is more accurate both in clustering large-scale manifold data sets and in selecting the number of clusters.
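The abstract does not spell out the Data-Cluster-Data random walk, so the following is only a minimal sketch under stated assumptions: each data point i has a hypothetical row of cluster probabilities W[i][k] that sums to 1, and a two-step walk goes from point i to cluster k and then back to point j in proportion to W[j][k]. The function name `dcd_similarity` and the toy matrix `W` are illustrative, not taken from the paper. Under these assumptions the resulting matrix is symmetric with rows and columns summing to 1, which is the doubly stochastic property the abstract refers to.

```python
def dcd_similarity(W):
    """Two-step data -> cluster -> data random walk matrix (illustrative sketch).

    W[i][k] is an assumed probability P(cluster k | point i); each row sums to 1.
    """
    n, r = len(W), len(W[0])
    # Total membership mass of each cluster, used to normalize the step back to data.
    mass = [sum(W[i][k] for i in range(n)) for k in range(r)]
    # A[i][j] = sum_k W[i][k] * W[j][k] / mass[k]
    return [
        [sum(W[i][k] * W[j][k] / mass[k] for k in range(r)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical soft assignments of four points to two clusters.
W = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.3, 0.7]]
A = dcd_similarity(W)

row_sums = [sum(row) for row in A]
col_sums = [sum(A[i][j] for i in range(len(A))) for j in range(len(A))]
print(row_sums)
print(col_sums)
```

The matrix has rank at most the number of clusters, so this construction is one way to read "low-rank" and "doubly stochastic" together; the paper's actual objective and optimization are not reproduced here.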
Similar resources
Improving cluster analysis by co-initializations
Many modern clustering methods employ a non-convex objective function and use iterative optimization algorithms to find local minima. Thus initialization of the algorithms is very important. Conventionally the starting guess of the iterations is randomly chosen; however, such a simple initialization often leads to poor clusterings. Here we propose a new method to improve cluster analysis by com...
Fast and Accurate Low Rank Approximation of Massive Graphs
In this paper we present a fast and accurate procedure called clustered low rank matrix approximation for massive graphs. The procedure involves a fast clustering of the graph and then approximating each cluster separately using existing methods, e.g. the singular value decomposition, or stochastic algorithms. The cluster-wise approximations are then extended to approximate the entire graph. Th...
Clustered low rank approximation of graphs in information science applications
In this paper we present a fast and accurate procedure called clustered low rank matrix approximation for massive graphs. The procedure involves a fast clustering of the graph and then approximates each cluster separately using existing methods, e.g. the singular value decomposition, or stochastic algorithms. The cluster-wise approximations are then extended to approximate the entire graph. Thi...
Some results on the symmetric doubly stochastic inverse eigenvalue problem
The symmetric doubly stochastic inverse eigenvalue problem (hereafter SDIEP) is to determine the necessary and sufficient conditions for an $n$-tuple $\sigma=(1,\lambda_{2},\lambda_{3},\ldots,\lambda_{n})\in \mathbb{R}^{n}$ with $|\lambda_{i}|\leq 1,~i=1,2,\ldots,n$, to be the spectrum of an $n\times n$ symmetric doubly stochastic matrix $A$. If there exists an $n\times n$ symmetric doubly stochastic ...
Stochastic bounds with a low rank decomposition
We investigate how we can bound a Discrete Time Markov Chain (DTMC) by a stochastic matrix with a low rank decomposition. In the first part of the paper we show the links with previous results for matrices with a decomposition of size 1 or 2. Then we show how the complexity of the analysis for steady-state and transient distributions can be simplified when we take into account the decomposition...
Journal title:
- Journal of Machine Learning Research
Volume 17
Publication date: 2016